Assignment 2

Group L01G02

Tyler Sitchon, Ethan Cunanan, Johan Kok, Tingwei Liang and Jerry Jin

Introduction

Aim: Determine whether a model can provide a convenient at-home estimate of body fat percentage

Data Description

Dataset Source:
- Sourced from BYU Human Performance Research Center
- Directed by Mark Ricard

Data Wrangling: BF Percent

Reducing Outliers

Siri equation

\[ \text{Pct.BF} = \frac{495}{\text{density}} - 450 \]
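A minimal Python sketch of the Siri conversion, assuming density is recorded in g/cm³. The function name `siri_body_fat` is hypothetical and is not taken from the original analysis.

```python
def siri_body_fat(density: float) -> float:
    """Body fat percentage from body density (g/cm^3) using the Siri equation."""
    if density <= 0:
        raise ValueError("density must be positive")
    return 495.0 / density - 450.0

# Example: a body density of 1.05 g/cm^3
print(round(siri_body_fat(1.05), 2))  # 21.43
```

Under this formula a density of 1.100 g/cm³ maps to 0% body fat and 0.900 g/cm³ maps to 100%, so any density outside that interval produces a physically impossible percentage.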

Wrangling cont.

Incorrect Results

  • Human body density typically falls between 0.900 and 1.100 g/cm³ (Jackson & Pollock, 2007)
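One way to apply this range check is to split rows into kept and dropped indices. This is a sketch only: the constant names, the inclusive bounds, and the helper `flag_implausible` are assumptions for illustration, and the density values in the example are made up.

```python
from typing import List, Tuple

# Plausible range for human body density in g/cm^3 (Jackson & Pollock, 2007);
# treating both bounds as inclusive is an assumption of this sketch
DENSITY_MIN = 0.900
DENSITY_MAX = 1.100

def flag_implausible(densities: List[float]) -> Tuple[List[int], List[int]]:
    """Return (kept, dropped) row indices based on the plausible density range."""
    kept, dropped = [], []
    for i, d in enumerate(densities):
        if DENSITY_MIN <= d <= DENSITY_MAX:
            kept.append(i)
        else:
            dropped.append(i)
    return kept, dropped

# Made-up densities; the third lies above 1.100 g/cm^3
kept, dropped = flag_implausible([1.0708, 0.995, 1.1089, 1.0414])
print(kept, dropped)  # [0, 1, 3] [2]
```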

Exploratory Data Analysis: Scatter plots

Exploratory Data Analysis: Correlation Analysis

Checking Assumptions: Multicollinearity

Checking Assumptions: Homoscedasticity and Normality

Residual plots

Stepwise Regression and Stepwise Subset Selection

Model 1 - Bidirectional

Model 2 - Backwards

Model 3 - Forwards

Cross Validation

Cross Validation Results

Comparison of Stepwise Regression Models
Model           RMSE       R-squared   Relative RMSE
Bidirectional   4.367317   0.7179396   22.91053
Backward        4.433432   0.7222846   23.25736
Forward         4.428418   0.7246442   23.23106
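The table does not define Relative RMSE. A definition consistent with the reported numbers is 100 × RMSE / mean observed body fat, because every row implies the same mean of about 19.06%. The sketch below adopts that definition as an assumption. It computes R² as 1 − SSE/SST, whereas some R packages report the squared correlation between observed and predicted values instead, so values can differ slightly. The function name `regression_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence

def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, R^2 (1 - SSE/SST) and relative RMSE for one set of predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(sse / n)
    return {
        "RMSE": rmse,
        "R_squared": 1.0 - sse / sst,
        # Assumed definition: RMSE as a percentage of the mean observed value
        "Relative_RMSE": 100.0 * rmse / mean_y,
    }

print(regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```

In a cross-validation setting these metrics would be computed on each held-out fold and then averaged across folds.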

Generalized Least Squares (GLS)

Overview of GLS

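  • Purpose: GLS accounts for heteroscedasticity and correlated errors by modelling the error covariance structure, providing more reliable estimates when these issues are present. It does not remedy multicollinearity among the predictors.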
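The GLS estimator is β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹y, where Σ is the error covariance matrix. The NumPy sketch below computes it by whitening: it factors Σ = LLᵀ, then runs ordinary least squares on L⁻¹X and L⁻¹y. This sketch assumes Σ is known. Here Σ is diagonal, which represents heteroscedastic but uncorrelated errors, and the data are simulated rather than taken from the BYU dataset. The name `gls_fit` is hypothetical.

```python
import numpy as np

def gls_fit(X: np.ndarray, y: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """GLS coefficients (X' S^-1 X)^-1 X' S^-1 y for a known error covariance S."""
    # Whitening: with S = L L', the errors of L^-1 y have identity covariance,
    # so OLS on the transformed data gives the GLS estimate
    L = np.linalg.cholesky(sigma)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

# Simulated data with heteroscedastic errors: the error SD grows with x
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 + 0.3 * x
y = 2.0 + 1.5 * x + rng.normal(0, sd)

beta = gls_fit(X, y, np.diag(sd ** 2))
print(beta)  # intercept and slope estimates (true values 2.0 and 1.5)
```

In practice Σ is unknown and must be estimated, for example by modelling the residual variance as a function of a predictor. This approach is called feasible GLS. When Σ is the identity matrix, the estimator reduces to ordinary least squares.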

Generalized Least Squares: Performance Analysis

Performance Metrics
Statistic       Value
RMSE            4.217467
Relative RMSE   22.124429
R-squared       0.739931

Bootstrapping

Overview

  • Purpose: Bootstrapping is a resampling technique that generates an empirical distribution of estimated parameters by repeatedly sampling the original dataset with replacement and refitting the model on each resample.
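The following sketch shows case-resampling bootstrap for OLS coefficients in NumPy. Each iteration draws n row indices with replacement, refits the model, and stores the coefficients. Percentile intervals are then read off the resulting empirical distribution. The data are simulated, and the function name `bootstrap_coefficients` and the defaults (1000 resamples, 95% percentile intervals) are assumptions for illustration, not the settings used in the original analysis.

```python
import numpy as np

def bootstrap_coefficients(X: np.ndarray, y: np.ndarray,
                           n_boot: int = 1000, seed: int = 0) -> np.ndarray:
    """Case-resampling bootstrap of OLS coefficients; returns an (n_boot, p) array."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices sampled with replacement
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        draws[b] = beta
    return draws

# Simulated data (not the BYU dataset): y = 1 + 2x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0, 1, 100)

draws = bootstrap_coefficients(X, y)
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)  # 95% percentile intervals
print("slope 95% interval:", lower[1], upper[1])
```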

Bootstrapping

Discussion of Results

Derived Variables

  • We also investigate whether common derived measurements, computed from variables already in our models, improve predictive performance.

Body Mass Index (BMI) where:

\[ \text{BMI} = \frac{\text{Weight (lb)}}{\text{Height (in)}^2} \cdot 703 \]

Waist-Hip Ratio (WHR) where:

\[ \text{WHR} = \frac{\text{Waist Circumference}}{\text{Hip Circumference}} \]
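Both derived variables are straightforward to compute. The sketch below uses the standard imperial BMI definition, 703 × weight (lb) / height (in)². For WHR, it only assumes that waist and hip circumferences share a unit, since the units cancel. The function names are hypothetical and the example inputs are made up.

```python
def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches: 703 * lb / in^2."""
    return 703.0 * weight_lb / height_in ** 2

def waist_hip_ratio(waist: float, hip: float) -> float:
    """Waist-hip ratio; both circumferences must be in the same unit."""
    return waist / hip

print(round(bmi_imperial(180, 70), 1))         # 25.8
print(round(waist_hip_ratio(90.0, 100.0), 2))  # 0.9
```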

Comparing Model Performance

Limitations

1. Sample Representativeness Issues

2. Potential Confounding Factors

3. Assumption of Linearity

Further Directions

1. Incorporating More Nonlinear Models

2. Increasing Sample Diversity

3. Model Ensemble Techniques